Online Learning for Structured Loss Spaces
We consider prediction with expert advice when the loss vectors are assumed
to lie in a set described by the sum of atomic norm balls. We derive a regret
bound for a general version of the online mirror descent (OMD) algorithm that
uses a combination of regularizers, each adapted to the constituent atomic
norms. The general result recovers standard OMD regret bounds, and yields
regret bounds for new structured settings where the loss vectors are (i) noisy
versions of points from a low-rank subspace, (ii) sparse vectors corrupted with
noise, and (iii) sparse perturbations of low-rank vectors. For the problem of
online learning with structured losses, we also show lower bounds on regret in
terms of rank and sparsity of the source set of the loss vectors, which imply
lower bounds for the above additive loss settings as well.
Comment: 24 pages
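For orientation, the standard case that the general result is said to recover can be sketched as follows. This is a minimal illustration of OMD on the probability simplex with a single negative-entropy regularizer (the Hedge / exponential-weights update), not the combined-regularizer algorithm of the paper. The function name `omd_entropic`, the step size `eta`, and the assumption that losses lie in [0, 1] are all illustrative choices that do not come from the abstract.

```python
import numpy as np


def omd_entropic(losses, eta):
    """Online mirror descent with a negative-entropy regularizer (Hedge).

    losses : array of shape (T, N), assumed here to have entries in [0, 1]
    eta    : positive step size
    Returns the final weight vector and the regret against the best
    single expert in hindsight.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)  # start from the uniform distribution
    learner_loss = 0.0
    for t in range(T):
        # The learner suffers the expected loss under its current weights.
        learner_loss += float(w @ losses[t])
        # Mirror step for the entropic regularizer: a multiplicative
        # update followed by renormalisation onto the simplex.
        w = w * np.exp(-eta * losses[t])
        w /= w.sum()
    best_expert_loss = losses.sum(axis=0).min()
    return w, learner_loss - best_expert_loss


# Hypothetical usage on synthetic losses.
rng = np.random.default_rng(0)
T, N = 500, 10
losses = rng.random((T, N))
eta = np.sqrt(8.0 * np.log(N) / T)
weights, regret = omd_entropic(losses, eta)
```

With losses in [0, 1] and this choice of `eta`, the classical Hedge analysis bounds the regret by sqrt(T ln N / 2). The structured settings in the abstract (low rank, sparsity) would call for different regularizers, which this sketch does not attempt to reproduce.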
Collaborative Learning of Stochastic Bandits over a Social Network
We consider a collaborative online learning paradigm, wherein a group of
agents connected through a social network are engaged in playing a stochastic
multi-armed bandit game. Each time an agent takes an action, the corresponding
reward is instantaneously observed by the agent, as well as its neighbours in
the social network. We perform a regret analysis of various policies in this
collaborative learning setting. A key finding of this paper is that natural
extensions of widely-studied single agent learning policies to the network
setting need not perform well in terms of regret. In particular, we identify a
class of non-altruistic and individually consistent policies, and argue by
deriving regret lower bounds that they are liable to suffer a large regret in
the networked setting. We also show that the learning performance can be
substantially improved if the agents exploit the structure of the network, and
develop a simple learning algorithm based on dominating sets of the network.
Specifically, we first consider a star network, which is a common motif in
hierarchical social networks, and show analytically that the hub agent can be
used as an information sink to expedite learning and improve the overall
regret. We also derive network-wide regret bounds for the algorithm applied to
general networks. We conduct numerical experiments on a variety of networks to
corroborate our analytical results.
Comment: 14 pages, 6 figures
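To make the role of dominating sets concrete, the sketch below computes one with a simple greedy heuristic and checks it on a star network. The abstract does not say how a dominating set is obtained, so the greedy rule, the function name `greedy_dominating_set`, and the adjacency-dictionary representation are assumptions made purely for illustration. The greedy result is a valid dominating set but is not guaranteed to be a minimum one.

```python
def greedy_dominating_set(adj):
    """Greedy heuristic for a dominating set of an undirected graph.

    adj : dict mapping each node to the set of its neighbours
    Returns a list of nodes such that every node is either in the list
    or adjacent to a node in the list.
    """
    uncovered = set(adj)
    dominating = []
    while uncovered:
        # Pick the node whose closed neighbourhood covers the most
        # still-uncovered nodes (ties broken by dictionary order).
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        dominating.append(best)
        uncovered -= {best} | adj[best]
    return dominating


# Star network: hub 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_set = greedy_dominating_set(star)

# Path network 0 - 1 - 2 - 3 for comparison.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_set = greedy_dominating_set(path)
```

On the star network the hub alone dominates every leaf, which matches the abstract's description of the hub as an information sink: every reward observed at a leaf is also seen by the hub.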